Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions

Authors

  • Kristin K Nicodemus
  • Wenyi Wang
  • Yin Yao Shugart
Abstract

Risk of complex disorders is thought to be multifactorial, involving interactions between risk factors. However, many genetic studies assess the association between disease status and markers one single-nucleotide polymorphism (SNP) at a time, because the search space of all possible interactions is high-dimensional. Three ensemble methods have recently been proposed for use with high-dimensional data: Monte Carlo logic regression, random forests, and generalized boosted regression. An intuitive way to detect an association between genetic markers and disease status is to use variable importance measures, even though the stability of these measures in the context of a whole-genome association study is unknown. For the simulated data of Problem 3 in the Genetic Analysis Workshop 15 (GAW15), we examined the variability of both the rankings and the magnitudes of variable importance measures using 10 variables simulated to participate in gene × gene and gene × environment interactions. We conducted 500 analyses per method on one randomly selected replicate, tallying the rankings and importance measures for each of the 10 variables of interest. When the simulated effect size was strong, all three methods showed stable rankings and estimates of variable importance. However, under conditions more commonly expected in complex diseases, random forests and generalized boosted regression showed more stable estimates of variable importance and variable rankings. Investigators applying statistical learning methods to detect interactions in complex disease studies should perform repeated analyses to ensure that variable importance measures and rankings do not vary greatly, even for statistical learning algorithms that are thought to be stable.
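The repeated-analysis procedure described in the abstract (many runs of a stochastic method, with rankings and importance values tallied per variable) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `toy_importance` is a hypothetical bootstrap-based stand-in for an importance measure (it is not Monte Carlo logic regression, random forests, or generalized boosted regression), and `simulate` generates made-up data with one strong, one weak, and one null variable rather than the GAW15 replicate.

```python
import random
import statistics
from collections import Counter


def toy_importance(data, labels, rng):
    """Hypothetical stand-in importance measure (an assumption, not from the paper):
    |mean(cases) - mean(controls)| per variable on one bootstrap resample."""
    n = len(labels)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of row indices
    scores = []
    for j in range(len(data[0])):
        cases = [data[i][j] for i in idx if labels[i] == 1]
        controls = [data[i][j] for i in idx if labels[i] == 0]
        if not cases or not controls:
            scores.append(0.0)
        else:
            scores.append(abs(statistics.fmean(cases) - statistics.fmean(controls)))
    return scores


def rank_scores(scores):
    """Rank 1 = most important; ties are broken by variable index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks


def stability_summary(data, labels, n_runs=500, seed=0):
    """Repeat the analysis n_runs times; tally ranks and importance per variable."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    all_scores = [[] for _ in range(n_vars)]
    rank_tallies = [Counter() for _ in range(n_vars)]
    for _ in range(n_runs):
        scores = toy_importance(data, labels, rng)
        for j, (s, r) in enumerate(zip(scores, rank_scores(scores))):
            all_scores[j].append(s)
            rank_tallies[j][r] += 1
    return [
        {
            "variable": j,
            "mean_importance": statistics.fmean(all_scores[j]),
            "sd_importance": statistics.stdev(all_scores[j]),
            "rank_counts": dict(rank_tallies[j]),
        }
        for j in range(n_vars)
    ]


def simulate(n=400, seed=1):
    """Made-up data: variable 0 has a strong effect, 1 a weak effect, 2 none."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append([rng.gauss(2.0 * y, 1.0), rng.gauss(0.2 * y, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return data, labels


data, labels = simulate()
summary = stability_summary(data, labels, n_runs=500)
for row in summary:
    print(row)
```

In a real analysis, `toy_importance` would be replaced by a call that refits the chosen ensemble method with a fresh random seed and returns its variable importance vector. A variable whose `rank_counts` are spread over many ranks, or whose `sd_importance` is large relative to its mean, would be flagged as unstable.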

Similar resources

In-silico study to identify the pathogenic single nucleotide polymorphisms in the coding region of CDKN2A gene

Background: CDKN2A, encoding two important tumor suppressor proteins p16 and p14, is a tumor suppressor gene. Mutations in this gene and subsequently the defect in p16 and p14 proteins lead to the downregulation of RB1/p53 and cancer malignancy. To identify the structural and functional effects of mutations, various powerful bioinformatics tools are available. The aim of this study is the ident...

A comprehensive in silico analysis of pathogenic nsSNPs in the NT5C2 gene involved in relapsed ALL

Background: About 10-20% of children suffering from acute lymphoblastic leukemia (ALL) experience a relapse, which is a major cause of death among them. Purine nucleotide analogs are frequently prescribed to maintain the treatment of ALL. Cytosolic 5´-nucleotidase (NT5C2) catalyzes the 5´ dephosphorylation of purine analogs. Gain-of-function mutations in the NT5C2 gene result in resistance to the t...

P-202: StuI Polymorphism on the Androgen Receptor Gene in Women with Endometriosis

Background: Androgens have an anti-proliferative effect on endometrial cells. Human androgen receptor (AR) gene contains two polymorphic short tandem repeats of GGC and CAG, and a single-nucleotide polymorphism on exon 1 that is recognized by the restriction enzyme, StuI. Prior studies have shown that the lengths of the CAG and GGC repeats are inversely and linearly related to AR activity and a...

The Single Nucleotide Polymorphisms in the C-reactive Protein Gene: are they Biomarkers of Cardiovascular Risk?

Recent pre-clinical and clinical studies have revealed that the C-reactive protein gene (CRP) is related to the degree of the acute rise in plasma C-reactive protein (CRP) levels. Moreover, single nucleotide polymorphisms (SNPs) in the CRP gene could be associated with an increased risk of cancer, atherosclerosis, diabetes mellitus, bowel disease, rheumatoid arthritis, psoriasis, obstructive pulmonary disease,...

Impact of Genetic Variants in Mir-122 Gene and its Flanking Regions on Hepatitis B Risk

MicroRNAs are small non-coding RNAs that are involved in regulating gene expression. Mir-122 was reported to inhibit hepatitis B virus (HBV), but little is known about the role of mir-122 polymorphisms in the development of HBV infection. The present study aimed to investigate the association between single nucleotide polymorphisms (SNPs) in the mir-122 gene region and HBV infection. Study cases were HB...

Journal:
  • BMC Proceedings

Volume 1

Publication date 2007